ABSTRACT

The Kepler spacecraft has collected data of high photometric precision and cadence almost con-
tinuously since operations began on 2009 May 2. Primarily designed to detect planetary transits
and asteroseismological signals from solar-like stars, Kepler has provided high quality data for many 
areas of investigation. Unconditioned simple aperture time-series photometry are however affected 
by systematic structure. Examples of these systematics are differential velocity aberration, thermal 
gradients across the spacecraft, and pointing variations. While exhibiting some impact on Kepler's 
primary science, these systematics can critically handicap potentially ground-breaking scientific gains 
in other astrophysical areas, especially over long timescales greater than 10 days. As the data archive 
grows to provide light curves for 10^5 stars of many years in length, Kepler will only fulfill its broad
potential for stellar astrophysics if these systematics are understood and mitigated. Post-launch de- 
velopments in the Kepler archive, data reduction pipeline and open source data analysis software 
have occurred to remove or reduce systematic artifacts. This paper provides a conceptual primer for 
users of the Kepler data archive to understand and recognize systematic artifacts within light curves 
and some methods for their removal. Specific examples of artifact mitigation are provided using data 
available within the archive. Through the methods defined here, the Kepler community will find a 
road map to maximizing the quality and employment of the Kepler legacy archive. 
Subject headings: Kepler, data reduction, data analysis techniques 



1. INTRODUCTION 

The Kepler spacecraft was launched on 2009 March 6 
with a potential operational lifetime limited by its ap- 
proximately 10 year supply of propellant (Koch et al. 
2010). Kepler's primary objective is to determine the fre- 
quency of Earth-sized planets within the habitable zone 
of solar-like stars, achieved by detecting planetary tran-
sits of stars within high-precision time-series photome-
try (Borucki et al. 2010; Koch et al. 2010; Caldwell et al.
2010). Transit durations typically last a few hours, and
they are separated by intervals of days to years. The 
Kepler mission collects data mostly on a 30-minute ca-
dence near-continuously with > 92% completeness. De-
tection of transits by small planets requires parts-per-
million photometric precision (Jenkins et al. 2002). The
primary objective will be realized through the combina-
tion of space-based photometric sensitivity, regular ob-
serving cadence, a 116 square degree field-of-view con-
taining a large number of target stars, and high duty
cycle (Koch et al. 2010).

One of the primary Kepler legacies is a commu-
nity archive containing the multi-year, time-series pho-
tometry of 1.5 × 10^5 astrophysical targets observed
for the planetary survey (Batalha et al. 2010) and
community-nominated targets of other astrophysical in-
terest. In addition to exoplanet science, Kepler pro-
vides unique datasets and raises scientific potential in
fields such as asteroseismology (e.g. Bedding et al. 2011;
Chaplin et al. 2011; Antoci et al. 2011; Beck et al. 2011),
gyrochronology (Meibom et al. 2011), stellar activ-
ity (e.g. Basri et al. 2011; Walkowicz et al. 2011), bi-
nary stars (e.g. Carter et al. 2011; Derekas et al. 2011;
Slawson et al. 2011; Thompson et al. 2012), and active
galactic nuclei (Mushotzky et al. 2011). We intend this
paper to serve as a reference guide for members of the
astronomical community who wish to use Kepler data. It
addresses how the community can mitigate artifacts in
the public data and exploit those data for stellar and
extragalactic science. Many of the techniques presented
here are applied to archived light curves and target pixel
files and will help optimize the data for astrophysical
research.

An understanding of the nature of archived data is 
critical for the effective exploitation of the Kepler legacy. 
Kepler provides high-precision photometry on the 1 and 
30 minute timescales. The data also contain artifacts 
that occur through spacecraft operation events and sys-
tematic trends over longer timescales as a natural con-
sequence of mission design (Van Cleve & Caldwell 2009;
Christiansen et al. 2011). Artifacts can both mask as-
trophysical signal and be misinterpreted as astrophysical
in origin (Christiansen et al. 2011). Furthermore, cav-
alier approaches to artifact mitigation (e.g. using sim-
ple function fits to time-series data) can destroy astro-
physical signal. Archived Kepler data have been pro-
cessed through a data reduction pipeline (Jenkins et al.
2010b). The pipeline functions include pixel-level cal-
ibration (Quintana et al. 2010), simple aperture pho-
tometry (Twicken et al. 2010a) and artifact mitigation
(Twicken et al. 2010b). This third step of artifact re-
moval will always be a subjective process. An archive
user can either accept the default pipeline correction
as being of suitable quality to enable their scientific objec-
tives or choose to perform artifact removal themselves,
starting from either the calibrated pixels or aperture pho-
tometry.

In this paper we provide a guide to Kepler data, insight 
into the nature of systematic artifacts, a description of 
how to remove them manually to best effect, and work- 
ing examples using open source data analysis software. 
In the next section, we list the available data and docu- 
mentation. We describe pixel level data in Section 3 and 
how they can be employed to construct new light curves 
and mitigate systematic artifacts. Section 4 focuses on 
the exploitation and limitations of Kepler's aperture pho- 
tometry products. In Section 5, we introduce cotrending 
basis vectors, which can be used to remove instrumen- 
tal artifacts from time-series aperture photometry. The 
related problem of stitching multiple quarters of data to- 
gether is discussed in Section 6. The appendix of this 
paper contains three worked examples of how to miti-
gate specific common issues with archived Kepler data.
We conclude the paper with a listing of helpful resources^1
for pursuing further artifact mitigation.

2. KEPLER SUMMARY AND RESOURCES 

In order to maintain long-term stable pointing upon a 
single field, the Kepler spacecraft is in an Earth-trailing, 
372 day heliocentric orbit. Kepler's 0.95-m aperture 
Schmidt telescope carries a photometer with an array
of 42 CCD chips (Koch et al. 2010). Each CCD has a
direct neighbor, and collectively they are referred to as 
a "module", of which there are 21. Each module has 
4 output nodes. Therefore the CCD array and targets 
falling upon silicon are often mapped according to mod- 
ule and output numbers. Alternatively, mission docu- 
mentation also refers to output nodes by "channel" num- 
ber, which ranges from 1-84. The module, output and 
channel locations are provided for each target within the 
archived meta-tables at the Mikulski Archive for Space 
Telescopes (MAST^2). The Kepler Instrument Handbook
(Van Cleve & Caldwell 2009) maps module, output and
channel numbers to the detector array. The field of view 
and pixel scale were designed to maximize the number 
of resolvable stars brighter than Kp = 15. Kp refers to
an AB magnitude (Oke 1974; Brown et al. 2011) across
the Kepler 425-900 nm bandpass (Van Cleve & Caldwell
2009). The Kp magnitude is composed of the star's
calibrated g, r, and i magnitudes, obtained during the 
pre-launch ground-based survey that constructed the
Kepler Input Catalog (Brown et al. 2011). The Ke-
pler field of view spans 115.6 square degrees over 94.6
million 3.98 x 3.98 arcsec detector pixels, with 95% of
the encircled energy contained within 3.14-7.54 pix-
els (Van Cleve & Caldwell 2009). The Kepler field con-
tains 10 million stars brighter than the magnitude limit
of Kp = 20-21. The camera takes one 6.02s image across 
the full field every 6.54 s. Exposures are summed on-
board and stored at either long cadence (29.4 min; see
Jenkins et al. 2010a) or short cadence (58.85 s; see
Gilliland et al. 2010). Science data are downloaded ap-
proximately once per month when Kepler leaves the field
temporarily to point its high-gain antenna towards Earth
(Haas et al. 2010). The number of pixels collected and
transmitted is a trade-off between maximizing the num- 
ber of targets delivered and minimizing the length of the 
data gaps. For long cadence observations, a maximum of 
5.4 × 10^6 pixels are stored onboard in the spacecraft Solid
State Recorder (Jenkins et al. 2010b), and the number
of targets can range from 150,000 to 170,000. Short ca- 
dence data are limited to 512 targets. Download requires 
approximately a 24-hour hiatus in data collection. 
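
The cadence durations quoted above follow from simple frame arithmetic, which the following minimal sketch reproduces. It assumes that a long cadence co-adds 270 frames (the count quoted in this section for a 29.4 min full-frame image) and that a short cadence co-adds 9 frames. The short cadence frame count is not stated in the text and is an assumption chosen to match the quoted 58.85 s.

```python
# Frame timing quoted in the text: a 6.02 s exposure is taken every 6.54 s.
EXPOSURE_S = 6.02
FRAME_PERIOD_S = 6.54

# Frames co-added per cadence. 270 is quoted in the text for a 29.4 min image;
# 9 is an ASSUMPTION chosen to match the quoted 58.85 s short cadence.
FRAMES_LONG = 270
FRAMES_SHORT = 9


def cadence_seconds(n_frames, frame_period_s=FRAME_PERIOD_S):
    """Wall-clock duration of one cadence built from n_frames co-added frames."""
    return n_frames * frame_period_s


def duty_cycle(exposure_s=EXPOSURE_S, frame_period_s=FRAME_PERIOD_S):
    """Fraction of each frame period spent collecting light."""
    return exposure_s / frame_period_s


long_cadence_min = cadence_seconds(FRAMES_LONG) / 60.0
short_cadence_s = cadence_seconds(FRAMES_SHORT)

print(f"long cadence : {long_cadence_min:.2f} min")
print(f"short cadence: {short_cadence_s:.2f} s")
print(f"duty cycle within a frame: {duty_cycle():.3f}")
```

Under these assumptions one long cadence spans exactly 30 short cadences, which is a convenient check when comparing the two data types for the same target.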

The long and short cadence pixels equate to less than 
6% of the detector plane, and the remaining pixel data 
are not stored. The stored pixels are chosen strategi-
cally to provide postage stamp images centered on the
positions of Kepler targets (Batalha et al. 2010). The
size of a postage stamp increases with target brightness, 
and the yield lies typically between 163,000-170,000 tar- 
gets per month. The critical concept for understand- 
ing instrumental artifacts is that in order to maximize 
the number of targets collected, the postage stamp sizes 

^1 Kepler archive users can find documentation, data analysis
software and helpdesk support at http://keplergo.arc.nasa.gov
^2 http://archive.stsci.edu/kepler






and shapes are chosen to maximize the per-target pho- 
tometric signal-to-noise on the 3-12 hour timescales of 
exoplanet transits. The postage stamps do not contain 
all the flux from a target because the collection of the 
target's faint PSF wings degrades signal-to-noise by the
inclusion of more sky background. The pixels within a 
postage stamp are defined by a calculation that combines 
the photometry and astrometry within the Kepler Input
Catalog (KIC; Brown et al. 2011) and an analytical pixel
response model for the detector and optics (Bryson et al.
2010a). The postage stamps are fixed within the pixel
array; they do not evolve over a 93-day observation pe- 
riod, or "quarter". A new target list is uploaded to the 
spacecraft after each quarterly roll. Changes in the tar- 
get list occur due to new detector geometry, operational 
developments to the exoplanet survey, and community- 
led science programs. Any time-dependent variation in 
the position of the target or the size of the point spread 
function will result in a redistribution of flux within the 
postage stamp pixels. The spacecraft pointing stabil- 
ity is good to typically 20 milliarcsec over 6.5 hours but 
the high precision light curves can contain systematic 
noise that manifests from thermally-driven focus vari-
ations, pointing offsets, and differential velocity aber-
ration (Christiansen et al. 2011; Van Cleve & Caldwell
2009). Many of the systematics within the archived light
curves are the result of time-dependent light losses as 
the target wings fall out of the pixel apertures and time- 
dependent contamination by neighboring sources falling 
into the pixel apertures. All collected data are stored and 
propagated to the community by the MAST. Technical 
manuals and reference material describing the mission, 
its data products and defining mission-specific terminol- 
ogy are also archived at the MAST. 

• Kepler Instrument Handbook - describes the
design, operation, and in-flight performance of
the Kepler spacecraft, telescope, and detector
(Van Cleve & Caldwell 2009).

• Kepler Data Processing Handbook - pro-
vides a description of the algorithms and pipelined
data processing performed upon collected data
(Fanelli et al. 2011).

• Kepler Archive Manual - describes the content
and format of archived Kepler data products and
the available archive search and retrieval resources
(Fraquelli & Thompson 2011).

• Kepler Data Characteristics Handbook - de-
fines the causes and provides illustrative exam-
ples of characteristics within the Kepler time-
series data and systematic artifacts found therein
(Christiansen et al. 2011).

• Kepler Data Release Notes - provide an impact 
assessment of systematic artifacts and spacecraft 
events upon Kepler time-series data. 

Within the MAST archive, Kepler data are stored as 
files designed to the format and conventions of the Flex-
ible Image Transport System (FITS; Pence et al. 2010).
Short cadence target data files each contain one month 
of observations, and long cadence data files contain one
quarter of observations. Instructions for the MAST data
search and retrieval tools are provided in the Kepler 
Archive Manual. The three primary forms of Kepler sci- 
ence data stored within the archive are: 

• Full-frame images (FFIs) - using all the pix- 
els on each detector channel, the Kepler spacecraft 
accumulates one 29.4 min image comprised of a se- 
quence of 270 consecutive 6.02s exposures coadded 
onboard, once per month, before each data down- 
load. The 84 channels are stored within a single 
FITS file. FFIs are collected primarily for engi- 
neering purposes but provide a scientific resource 
in their own right - high-precision photometry of 
the entire Kepler field of view on a 1-month ca-
dence. Additionally, FFIs can be employed to as-
sess the target's flux level and sources of contami-
nation from nearby objects within a target's pixel
aperture. 

• Target Pixel files (TPFs) - each TPF stores 
a time-stamped sequence of uncalibrated and cali- 
brated postage stamp pixel images of a Kepler tar- 
get over a long cadence quarter or short cadence 
month. The TPF content is discussed further in 
the context of systematic artifacts in Section 3. 

• Light curve files - each file contains time-series 
photometry for an individual Kepler target, derived 
from an optimized subset of pixels contained within 
the associated TPF. A more detailed examination 
of these files and the consequences of the systematic 
effects within the TPFs are provided in Section 4. 

3. KEPLER ARCHIVED TARGET PIXEL FILES (TPF) 

For each individual Kepler target, light curve files (dis- 
cussed in Section 4) are accompanied by a TPF. The TPF 
is the single-most informative resource in the archive for 
understanding the instrumental, non-astrophysical fea- 
tures within a target light curve. Consequently, it is 
recommended that users always examine a TPF in con-
junction with its light curve. TPFs provide the detector
pixel and celestial coordinate mapping of a target mask 
and its subset containing the optimal aperture. TPFs re- 
veal the motion of a target across the optimal aperture, 
and the motion of nearby contaminating sources. 

As with the light curve file, the TPF content is orga- 
nized by timestamp. The timestamps are the barycen- 
tric Julian date at the midpoint of each accumulated 
exposure. Pixel data are presented as a time-series of 
photometrically calibrated images, one image per times- 
tamp, all located within the first extension of the FITS 
file. Typical Kepler targets (Kp > 12) will contain 10-
50 pixels, but bright sources will include a larger pixel
set. A typical quarter will contain approximately 4,300 
collected images in long cadence for each target, while a 
typical month will yield 43,200 images in short cadence 
mode. For each timestamp within the TPF, there is an 
image of the uncalibrated pixels collected around a tar- 
get. This group of pixels is referred to as the target mask, 
which contains pixels assigned either to the optimal aper-
ture or a halo around it. The optimal aperture is the set
of pixels over which the collected flux is summed by the
Kepler pipeline to produce a light curve. The halo pixels
are those pixels surrounding the optimal aperture pix-
els, which are used for calibration purposes and provide
operational margin. 

Additionally, for each timestamp, the TPF includes
the raw counts (FITS column labeled "RAW_CNTS") and
a fully calibrated postage stamp pixel image (FLUX)
which incorporates bias correction, dark subtraction,
flat fielding, cosmic ray removal, gain and nonlinear-
ity corrections, smear correction (the Kepler photome-
ter has no shutter), and correction for local detector
electronic undershoot (i.e. sensitivity of the pixel response
to bright objects). The Data Processing Handbook (Fanelli
et al. 2011) and Caldwell et al. (2010) contain further details
of these corrections. The TPFs also provide images
for each timestamp containing the 1-σ flux uncertain-
ties of each pixel (FLUX_ERR), a calibrated sky back-
ground (FLUX_BKG), 1-σ uncertainties on the sky back-
ground (FLUX_BKG_ERR), and cosmic ray incidences
(COSMIC_RAYS). All of these postage stamp images
are found in the first extension of the FITS file. Sky
background generally cannot be well-estimated from the 
TPFs themselves because target masks do not encompass 
enough sky for a good background estimation. Instead 
4,464 pixels across each channel are recorded at long ca- 
dence specifically to measure the sky background. Those 
measurements are interpolated across each target mask 
to characterize the local sky background for each aper- 
ture. 

Each timestamp has a quality flag coupled to it which 
alerts the user to phenomena and systematic behavior 
that may bring the quality of the photometric mea- 
surement within that timestamp into question. Detri- 
mental behavior is generally associated with one of the 
following: i) physical events such as cosmic rays fol- 
lowed by short term detector sensitivity dropouts, ii) 
foreign particles such as dust, iii) spacecraft motion 
due to attitude-control reaction wheel resets, zero-torque 
crossing events, spacecraft pointing offsets and/or loss
of fine-pointing upon guide stars, iv) time-dependent 
variation in incident solar radiation and telescope fo- 
cus (caused by either Kepler's orbit, autonomous opera- 
tional commands directing the spacecraft to re-point to- 
wards a safe direction, or pointings towards Earth for 
the transmission of data), and v) differential velocity 
aberration (DVA) which is caused by the spacecraft's 
orbital motion constantly changing the local pixel scale
and field distortion. See Christiansen et al. (2011) for
more detailed descriptions of each event listed above and
Fraquelli & Thompson (2011) for a list of quality flags in
the TPF and their corresponding events. Note that the list
provided here contains only the major causes of artifacts
common to many operational quarters of data. Other
events do exist and are described in Christiansen et al.
(2011).
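
As an illustration of how the per-timestamp quality flags can be used, the following minimal sketch rejects flagged cadences from a time-series. It treats the flag as a bit-encoded integer in which zero means that no event was recorded. The flag values in the example (16 and 128) are hypothetical placeholders rather than documented flag meanings, and the function name and the `ignore_bits` parameter are our own; the actual bit definitions should be taken from the Kepler Archive Manual.

```python
def good_cadences(time, flux, quality, ignore_bits=0):
    """Return (time, flux) lists restricted to cadences with a clear flag.

    ignore_bits is a bitmask of flag bits the caller chooses to tolerate;
    the default of 0 rejects every flagged cadence.
    """
    kept_time, kept_flux = [], []
    for t, f, q in zip(time, flux, quality):
        # Mask out tolerated bits, then require that nothing remains set.
        if (q & ~ignore_bits) == 0:
            kept_time.append(t)
            kept_flux.append(f)
    return kept_time, kept_flux


# Synthetic, made-up values for illustration only (not real Kepler data).
time = [0.00, 0.02, 0.04, 0.06, 0.08]
flux = [1000.0, 1001.0, 950.0, 999.0, 1002.0]
quality = [0, 0, 16, 128, 0]  # hypothetical flag values

strict_time, strict_flux = good_cadences(time, flux, quality)
lenient_time, lenient_flux = good_cadences(time, flux, quality, ignore_bits=128)
print(len(strict_flux), len(lenient_flux))
```

Rejecting every flagged cadence is the conservative choice. Whether a particular flag can be tolerated depends on the science goal, which is why the tolerated bits are left as a user decision here.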

A pixel bitmap indicating the use of each pixel within 
the target mask in the Kepler pipeline is stored as an 
image in the second FITS extension of the TPF. The 
target masks do not track and follow stellar motion, and 
the large pixel scale undersamples the point spread func- 
tion. Optimal apertures in general cannot provide abso- 
lute photometry because target flux is always lost out- 
side the pixel borders and contamination from nearby 
objects falls inside the collection area. Spacecraft motion 
and focus variation are detrimental to Kepler photome-
try because time-dependent variation in either property
results in different fractions of target light being lost from 
the aperture and different amounts of source contamina- 
tion falling into the aperture. Despite Kepler's cadence- 
to-cadence pointing generally being stable at the milli- 
arcsecond level, motion larger than this threshold has 
a measurable consequence at the few 10~^ photometric 
accuracy. Kepler simple aperture photometry is conse- 
quently a combination of astrophysical signal from the 
target and systematics from the spacecraft and its en- 
vironment. The more precise the scientific requirement, 
the more care one must take to achieve accurate photo- 
metric results. 
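
The role of the pixel bitmap can be sketched as follows. The example separates optimal aperture pixels from halo pixels in a synthetic target mask and estimates what fraction of the collected flux the optimal aperture captures. The bit meanings used (1 for a collected pixel, 2 for membership of the optimal aperture) are an assumption to be checked against the Kepler Archive Manual, and the image values are made up.

```python
import numpy as np

# ASSUMED bit meanings for the aperture bitmap (verify against the Kepler
# Archive Manual): 1 = pixel collected, 2 = pixel in the optimal aperture.
COLLECTED_BIT = 1
OPTIMAL_BIT = 2


def optimal_aperture(bitmap):
    """Boolean mask of pixels flagged as part of the optimal aperture."""
    return (np.asarray(bitmap) & OPTIMAL_BIT) != 0


def halo(bitmap):
    """Boolean mask of collected pixels lying outside the optimal aperture."""
    bitmap = np.asarray(bitmap)
    return ((bitmap & COLLECTED_BIT) != 0) & ~optimal_aperture(bitmap)


# Synthetic 4x4 target mask: 0 = not collected, 1 = halo, 3 = optimal aperture.
bitmap = np.array([[0, 1, 1, 0],
                   [1, 3, 3, 1],
                   [1, 3, 3, 1],
                   [0, 1, 1, 0]])

# Synthetic single-cadence image of a centrally peaked source (made-up values).
image = np.array([[0.0, 2.0, 2.0, 0.0],
                  [2.0, 40.0, 40.0, 2.0],
                  [2.0, 40.0, 40.0, 2.0],
                  [0.0, 2.0, 2.0, 0.0]])

in_aperture = optimal_aperture(bitmap)
in_halo = halo(bitmap)

# Fraction of the flux in the whole target mask that the aperture captures.
captured_fraction = image[in_aperture].sum() / image[in_aperture | in_halo].sum()
print(in_aperture.sum(), in_halo.sum(), round(captured_fraction, 3))
```

Repeating the captured-fraction calculation for every cadence of a real TPF gives a direct view of how much flux migrates between the aperture and the halo as the target moves.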

Using the motion across the detector of a set of ref- 
erence stars, the Kepler pipeline predicts the motion of 
the target over time and provides a predicted position 
for each timestamp within the TPF (the FITS columns 
labelled POS_CORR1 and POS_CORR2). Predictions
of the target motion trace many of the Kepler system-
atics, so any measured astrometric deviation from the
reference star predictions is likely to be astrophysical
in nature. For example, as a target's brightness varies,
the centroid of the flux distribution across the pixels will 
move if there are contaminating sources in the target 
mask's pixels. Tracking the change in the flux centroid
position can help detect faint, unresolved back-
ground binaries or other variable stars, which can appear
as a false detection of a planetary transit (e.g. see Ap-
pendix A).

Figure 1 provides a typical example of a Kepler long ca-
dence light curve, specifically the quarter 2 simple aper-
ture photometry of the eclipsing binary star V1950 Cyg.
In this example, the short-term system-
atics are dominated by the high-amplitude intrinsic vari-
ability of the target, but the effects of DVA still clearly
manifest as a 3% decrease in target flux over the duration
of the quarter. The situation is clarified by inspection of
the associated TPF. Figure 2 reconstructs the photom-
etry for each individual pixel within the target mask of
the source of interest. The mask includes both the star
itself and the halo pixels around the star. The pixels in
Figure 2 with gray backgrounds are chosen by the Ke-
pler pipeline to be the photometric optimal aperture for
the archived light curve. The other pixels in the halo are 
associated with the target mask but they play no part 
in the calculation of the archived time-series photome- 
try. The wings of the point spread function extend into 
many pixels surrounding the optimal aperture. Kepler 
apertures, in general, are designed to maximize signal-to- 
noise which usually undercaptures target flux. Without 
full capture of a source, motion and focus-induced arti-
facts in the summed light curve are inevitable. In Figure
2, as the quarter proceeds, note how the flux captured in
each pixel can increase or decrease as the target moves 
across the pixel array. This example shows that the flux 
is continuously being redistributed between neighboring 
pixels of the optimal aperture, and different amounts of 
flux will fall outside this aperture over time. The times 
and nature of events related to the discontinuities seen in 
the TPF light curves are recorded under the QUALITY 
column in the TPF FITS headers. 

The example of V1950 Cyg illustrates how Kepler light 
curves can easily be affected by instrument systematics 
(e.g., DVA). There are instances where extracting target 
pixels over a larger detector area compared to the optimal
aperture chosen by the pipeline minimizes or removes the 
instrument effects from the light curves. However, the ar- 
tifact mitigation of including more pixels comes at a cost. 
More pixels within the flux summation decreases the pho- 
tometric signal-to-noise by adding more sky background 
from the additional pixels. The user must decide whether 
adding more pixels is an acceptable mitigation for the 
systematics. Many Kepler targets exist within crowded 
fields, and multiple nearby sources may contribute to the 
flux within the target mask. While it is sometimes ben- 
eficial to include more pixels within a modified optimal 
aperture, in the worst case scenario, the systematic errors 
in light curves and source contamination can increase sig- 
nificantly by including more pixels in extraction. 
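
The trade-off between flux capture and added sky background can be illustrated with a simple noise budget. The sketch below assumes a Poisson noise model in which the variance is the source counts plus a per-pixel contribution from sky background and read noise. It ignores systematics, crowding and quantization, and all numbers are made up for illustration.

```python
import math


def aperture_snr(source_e, n_pixels, sky_e_per_pixel, read_noise_e=0.0):
    """Signal-to-noise of an aperture sum under a simple Poisson noise model.

    All quantities are in electrons per cadence. This is an illustrative
    model only and not the pipeline's aperture optimization.
    """
    variance = source_e + n_pixels * (sky_e_per_pixel + read_noise_e ** 2)
    return source_e / math.sqrt(variance)


# Made-up numbers: a compact aperture capturing 90% of the target flux versus
# a larger one capturing 99% at the price of four times as many pixels.
total_source_e = 1.0e5
sky_e_per_pixel = 5.0e3

snr_small = aperture_snr(0.90 * total_source_e, n_pixels=9,
                         sky_e_per_pixel=sky_e_per_pixel)
snr_large = aperture_snr(0.99 * total_source_e, n_pixels=36,
                         sky_e_per_pixel=sky_e_per_pixel)

print(f"small aperture SNR: {snr_small:.1f}")
print(f"large aperture SNR: {snr_large:.1f}")
```

With these illustrative numbers the larger aperture has the lower signal-to-noise even though it captures more of the target. For brighter targets or lower backgrounds the balance shifts, which is why the choice must be made target by target.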

Inspection of the time-series images within the TPF 
can help reveal whether photometric variability is due to 
source contamination or is intrinsic to the target. An 
example of a background object contaminating the light 
curve of a target is provided in Appendix A. 

4. KEPLER LIGHT CURVE FILES 

The Kepler archive stores target specific light curve 
files in binary FITS format that have been derived from 
each TPF. The format of the FITS file is defined in
Fraquelli & Thompson (2011). The time stamps, qual-
ity flags, predicted target motion relative to the detector
pixels, and pixel bitmap information described in Section 
3 are copied into the light curve FITS tables. Kepler 
light curve files contain a number of columns containing 
flux information. Two columns contain simple aperture 
photometry (SAP) flux with 1-σ statistical uncertain-
ties, while a more processed version of SAP with artifact
mitigation included, called Pre-search Data Conditioning
(PDCSAP) flux (Smith et al. 2012), and its uncertain-
ties populate two more columns. The sky background
values, summed across the optimal aperture, and their 1-σ
uncertainties are calculated from the TPF and added to
the light curve files. The last set of columns in the light 
curve FITS files are the timestamped moment-derived
centroid positions of the target, as calculated from the
calibrated TPF images. The Data Processing Handbook
(Fanelli et al. 2011) provides details on how the centroid
positions were calculated. The centroids are provided in 
detector pixel row and column coordinates. The centroid 
positions can be used as a direct comparison to the mo- 
tion predicted from a set of reference stars per CCD chan- 
nel. The purpose of the comparison between measured 
flux centroid and the centroid predicted from the motion 
of reference stars is to identify times when these quan- 
tities are uncorrelated. Potentially, uncorrelated cen- 
troid structure identifies events in the light curve that 
are caused by fractional changes in contamination from 
sources close to or unresolved from the target. 
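
A minimal sketch of this comparison is given below. It computes a flux-weighted (first-moment) centroid from a postage stamp image and subtracts the motion predicted from reference stars, so that what remains is the centroid shift not explained by spacecraft motion. The function names are our own, the data are synthetic, and the pipeline's own centroiding, described in the Data Processing Handbook, may differ in detail.

```python
import numpy as np


def moment_centroid(image, first_column=0, first_row=0):
    """Flux-weighted (first-moment) centroid of one postage stamp image.

    Returns (column, row) in detector pixel coordinates, given the detector
    coordinates of the stamp's first column and row.
    """
    image = np.asarray(image, dtype=float)
    rows, columns = np.indices(image.shape)
    total = image.sum()
    column = (columns * image).sum() / total + first_column
    row = (rows * image).sum() / total + first_row
    return column, row


def centroid_residual(measured, predicted):
    """Measured centroid shift minus the shift predicted from reference stars."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return (measured - measured.mean()) - (predicted - predicted.mean())


# Synthetic example: a symmetric 3x3 stamp, then the same stamp with extra
# flux in the right-hand column, mimicking a brightening contaminant.
stamp = np.array([[1.0, 2.0, 1.0],
                  [2.0, 8.0, 2.0],
                  [1.0, 2.0, 1.0]])
contaminated = stamp.copy()
contaminated[:, 2] += 1.0

col_a, row_a = moment_centroid(stamp, first_column=100, first_row=200)
col_b, row_b = moment_centroid(contaminated, first_column=100, first_row=200)
print(col_a, row_a, col_b, row_b)

# With no predicted motion, the column residual isolates the contaminant shift.
residual = centroid_residual([col_a, col_b], [0.0, 0.0])
print(residual)
```

In practice the residual would be examined alongside the light curve: a centroid excursion that coincides with a flux event but is absent from the predicted motion points to a change in contamination rather than spacecraft motion.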

4.1. Simple aperture photometry (SAP) 

The SAP light curve is a pixel summation time-series 
of all calibrated flux falling within the optimal aperture, 
as stored and defined in the TPF. The 1-σ errors are cal-
culated from standard Gaussian error propagation of the
TPF errors through the sum. Data archive users need to 
be aware that a SAP light curve can be contaminated by 
astrophysics from neighboring sources. One can inspect 
the concurrent TPF to identify contamination. A new
SAP light curve can be extracted from the TPF using a
custom selection of pixels, as shown in Appendix B. 
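
The pixel summation and error propagation described in this section can be sketched as follows for a user-defined aperture. The sketch assumes that per-pixel errors are independent and Gaussian, so that they add in quadrature. The array layout (cadence, row, column) and the function name are our own choices, and the data are synthetic.

```python
import numpy as np


def extract_sap(flux_cube, flux_err_cube, aperture):
    """Sum calibrated flux over a boolean aperture for every cadence.

    flux_cube and flux_err_cube have shape (cadence, row, column); aperture
    has shape (row, column). Per-pixel errors are ASSUMED independent and
    Gaussian, so they are combined in quadrature.
    """
    aperture = np.asarray(aperture, dtype=bool)
    flux = np.asarray(flux_cube, dtype=float)[:, aperture].sum(axis=1)
    variance = (np.asarray(flux_err_cube, dtype=float)[:, aperture] ** 2).sum(axis=1)
    return flux, np.sqrt(variance)


# Synthetic data: 2 cadences of a 3x3 stamp with uniform made-up values.
flux_cube = np.full((2, 3, 3), 100.0)
flux_err_cube = np.full((2, 3, 3), 3.0)

# A user-chosen aperture: the central cross of five pixels.
custom_aperture = np.array([[0, 1, 0],
                            [1, 1, 1],
                            [0, 1, 0]], dtype=bool)

sap_flux, sap_err = extract_sap(flux_cube, flux_err_cube, custom_aperture)
print(sap_flux, sap_err)
```

Real postage stamps may contain undefined pixel values outside the target mask, so a production version would need to exclude or otherwise handle those pixels before summing.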

Archive users must expect, a posteriori, that SAP pho- 
tometry is contaminated by the systematics discussed in 
Section 3. To continue using SAP data for scientific ex- 
ploitation, one must decide whether the artifacts will im- 
pact their results and conclusions. There is "low-hanging 
fruit" that has dominated Kepler astrophysics activity 
in the early phases of the mission because the SAP data 
has proved to be adequate for specific science goals with- 
out artifact mitigation. For example, asteroseismology of 
solar-like oscillations, δ Scuti and γ Doradus pulsations
has been hugely successful because signals of frequency
> 1 d^-1 are mostly unaffected by the majority of arti-
facts (Balona & Dziembowski 2011; Uytterhoeven et al.
2011; Balona et al. 2011). Some high frequency artifacts
that could prove problematic to these programs can be 
filtered out of the time-series using the quality flags pro-
vided. Data analysis of cataclysmic variables, RR Lyr
stars and Cepheids is just as successful. While many
of the astrophysical timescales of interest in these pul-
sators can be longer than a few days and similar to the
thermal resettling times of the spacecraft after a pointing
maneuver, the large amplitude of target variability domi-
nates over systematics that can consequently be neglected
(e.g. Still et al. 2010; Nemec et al. 2011; Szabo et al.
2011).

There are many astrophysical applications that are less 
likely to benefit from direct employment of SAP data. 
These include any science relying on more subtle light 
curve structures and periods longer than a few days, in 
which case the systematics discussed are more likely to 
be significant. Investigations of magnetic activity, gy- 
rochronology, binary stars and long period variables must 
scrutinize the SAP data with great care before proceed- 
ing and will most likely benefit from one of three available 
artifact mitigation methods. The three methods are to 
use archived PDCSAP photometry (Section 4.2), to re- 
extract the SAP light curve over a larger set of pixels 
(Appendix B), or to perform a custom correction on the 
archived SAP data using cotrending basis vectors (Ap- 
pendix C). These methods and their precise application 
are very subjective. The highest quality Kepler research 
will in most cases result from the experience and under- 
standing gained by applying all three of these methods 
to the archived data. 

4.2. Pre-search data conditioning simple aperture 
photometry (PDCSAP) 

The PDCSAP data included within the archived light 
curve files are produced by a pipeline module that re-
mains under continuing development at the time of writ-
ing (Smith et al. 2012; Stumpe et al. 2012). Versions of
PDCSAP artifact mitigation algorithms (before Novem- 
ber 2011) employed the removal of analytical functions 
while correlating the photometry with spacecraft diag- 
nostic information such as the focal plane temperature. 
This approach focused upon solving the problem for the 
effective detection of exoplanet transits, indiscrim-
inately removing both systematic and astrophysical vari-
ability that would interfere with transit detection. PDC-
SAP data provided before November 2011 should not be
used without skepticism for any purpose other than tran-
sit detection. It is recommended that this older version
of PDCSAP data not be used for stellar or extragalac-
tic astrophysics. All such data have been reprocessed by
the pipeline with different artifact mitigation algorithms 
which are more robust for astrophysics and re-delivered 
before May 30, 2012. 

Post October 2011, quarterly data deliveries to the 
archive were constructed using versions of the pre-search 
data conditioning (PDC) pipeline module developed to 
better remove systematic artifacts while conserving more 
astrophysical signal. This development provides a change 
of approach and comes in two parts. In the first part, sys- 
tematic artifacts are characterized by quantifying the fea- 
tures most common to hundreds of strategically-selected 
quiet targets on each detector channel. For each chan- 
nel and each operational quarter, this characterization 
is stored as 16 best-fit vectors called "Cotrending Basis 
Vectors" (CBVs). The basis vectors archived represent 
the most common trends found over each channel. The 
CBVs are ranked by order of the relative amplitude they 
contribute to systematic trends across a channel. An ex-
ample of the 8 most dominant CBVs for CCD channel
50 over quarter 5 is provided in Figure 3. The pipeline
algorithm for constructing CBVs is de-
scribed in Smith et al. (2012) and Stumpe et al. (2012).

In the second part of the PDC pipeline module, systemat-
ics are removed from SAP time-series by subtracting the
CBVs. The results are stored in the archived files and 
labeled PDCSAP data. The correction is unique to each 
target. A weighted normalization for each basis vector 
in the calculation is determined by fitting basis vectors 
to the SAP data, but the CBV weighting and "best" as- 
trophysical solution remains a subjective problem. The 
process is therefore repeatable by archive users and tun- 
able. The pipeline has configured the tuning to provide 
the most effective conservation of astrophysics within the 
Kepler targets each quarter by detector channel as a sta- 
tistical sample. The pipeline algorithm therefore pro- 
vides a significant improvement in the quality of artifact 
mitigated photometry. However the PDC algorithm is 
not tuned to individual targets or specific classes of tar- 
get. As the PDC pipeline continues to mature, the num- 
ber of individual problematic cases in the archive will 
shrink. However, for any individual target, we recom- 
mend direct comparison of the three artifact mitigation 
methods available in order to understand whether the 
archived data provides a solution optimized to the users' 
scientific requirement. A manual re-extraction of a target 
light curve from a TPF will produce a SAP time-series, 
but not a PDCSAP time-series. If artifact mitigation is 
required subsequent to light curve extraction, then the 
only viable option is to manually fit the CBVs. Manual 
CBV fitting and subtraction is the subject of Section 5. 

5. REMOVING SYSTEMATIC ARTIFACTS WITH 
COTRENDING BASIS VECTORS 

The 16 most significant CBVs per channel for each
quarter are calculated by the Kepler pipeline and packaged
as FITS binary files. These CBVs are available for
download at MAST^3. File content and format are described
in Fraquelli & Thompson (2011). With the provision
of CBVs, the responsibility rests with the archive
user to either improve upon the existing artifact mitigation
done by the pipeline or perform manual artifact
removal from photometry re-extracted from a TPF. Basis
vectors are usually fit to the SAP light curve linearly,
i.e. each basis vector is scaled by a coefficient and subtracted
from the flux time-series. Computationally, the
most efficient method is a linear least-squares fit. In Figure 4
we plot 16 SAP light curves from channel 50 in
quarter 5. In Figure 5 we plot the same light curves as in
Figure 4 but with the most significant CBVs fit and subtracted.
In all of the light curves in Figure 5
the systematic trends have been greatly reduced and
the astrophysics is more clearly delivered. Appendix C
provides an example of how to apply the CBVs to data.

^3 http://archive.stsci.edu/kepler/cbv.html
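The linear fit described above can be sketched in a few lines of Python. This is a minimal illustration and not the kepcotrend implementation: the function name fit_cbvs, the choice to fit the median-normalized flux, and the synthetic basis vectors and light curve are all assumptions made for the example.

```python
import numpy as np


def fit_cbvs(flux, cbvs):
    """Linear least-squares fit and subtraction of basis vectors.

    flux : (n_cadences,) SAP flux
    cbvs : (n_cadences, n_vectors) basis vectors in relative flux units
    Returns the fit coefficients and the corrected flux.
    """
    flux = np.asarray(flux, dtype=float)
    cbvs = np.asarray(cbvs, dtype=float)
    median = np.median(flux)
    # Assumption: the fit is made to fractional variations about the median.
    norm = flux / median - 1.0
    coeffs, *_ = np.linalg.lstsq(cbvs, norm, rcond=None)
    # Subtract the scaled basis vectors and restore the original flux units.
    corrected = (norm - cbvs @ coeffs + 1.0) * median
    return coeffs, corrected


# Synthetic stand-ins for two CBVs and one target (not Kepler data).
t = np.linspace(0.0, 1.0, 200)
cbvs = np.column_stack([t - t.mean(), t**2 - (t**2).mean()])
signal = 1e-3 * np.sin(2 * np.pi * 20 * t)
flux = 1000.0 * (1.0 + 0.03 * cbvs[:, 0] - 0.02 * cbvs[:, 1] + signal)

coeffs, corrected = fit_cbvs(flux, cbvs)
print("fit coefficients:", coeffs)
print("scatter before and after:", np.std(flux), np.std(corrected))
```

With archive data, the flux array would come from the SAP column of a light curve file and the basis vectors from the CBV file for the same quarter and channel, sampled on the same cadences.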

An important decision for the CBV user is how many 
basis vectors to fit and remove from the SAP data. Fit- 
ting too few will capture instrumental artifacts less ef- 
fectively. However, using too many can overfit the data, 
removing real astrophysical features. A further consid- 
eration is that no basis vector is perfect. The inclusion 
of each additional CBV to the fit adds a noise compo- 
nent to the data. The choice of CBV number is in reality 
a trade-off between maximizing the removal of systematics
on the one hand, and avoiding the removal of real astro- 
physics and minimizing the effects of CBV noise on the 
other. A minimum of two basis vectors should be fit to 
the data because, instead of strictly enforcing a constant 
first or second basis vector, CBVs are created by mixing 
a constant offset with the strongest non-constant basis 
vector. We find that an iterative method is the most
effective approach, starting with two basis vectors and 
increasing the number of vectors monotonically until de- 
ciding upon a subjective optimal fit. Appendix C shows 
example fits of 2, 5, and 8 CBVs to demonstrate the it- 
erative method of determining the number of CBVs to 
use. 
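The iterative approach can be illustrated by repeating the fit with an increasing number of vectors and recording the residual scatter each time. The sketch below is hypothetical: residual_rms, the cosine stand-in basis vectors and the synthetic light curve are assumptions, not archive products. Because the fits are nested, the residual RMS can only fall as vectors are added, so this number cannot select the optimal fit on its own. The corrected light curve still has to be inspected at each step, as the text describes.

```python
import numpy as np


def residual_rms(flux, cbvs, n_vectors):
    """RMS of the normalized light curve after fitting the first n_vectors."""
    norm = flux / np.median(flux)
    norm = norm - norm.mean()
    basis = cbvs[:, :n_vectors]
    coeffs, *_ = np.linalg.lstsq(basis, norm, rcond=None)
    return float(np.sqrt(np.mean((norm - basis @ coeffs) ** 2)))


# Synthetic stand-ins: eight smooth, zero-mean "basis vectors" and a target
# whose systematics involve only the first two of them.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
cbvs = np.column_stack([np.cos(np.pi * k * t) for k in range(1, 9)])
cbvs = cbvs - cbvs.mean(axis=0)
flux = 1e4 * (1.0 + 0.02 * cbvs[:, 0] - 0.01 * cbvs[:, 1]
              + 1e-3 * rng.standard_normal(t.size))

# Start with two vectors and increase the number, as recommended in the text.
scan = {n: residual_rms(flux, cbvs, n) for n in (2, 5, 8)}
for n, rms in scan.items():
    print(f"{n} CBVs: residual RMS = {rms:.2e}")
```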

Occasionally the linear least-squares fit is not sufficient, 
and a more robust fitting method must be utilized. One 
option is to fit the CBVs to the SAP time-series using 
an iterative-clipping least-squares method. This method 
identifies data points outside of a distance threshold from 
the best fit. Data points outside of the threshold are ex- 
cluded and the fit recast. This procedure iterates until 
no further data points are rejected and is more robust 
to outliers than a regular least-squares fit. Alternatively, 
rapid, high amplitude astrophysical variability can bias 
the goodness of fit away from the best astrophysical so- 
lution. 
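A minimal sketch of an iterative-clipping least-squares fit is given below. The function name clipped_lstsq, the 3-sigma threshold and the synthetic data are assumptions chosen for illustration. The sketch is not the algorithm used by any particular tool.

```python
import numpy as np


def clipped_lstsq(norm_flux, basis, sigma=3.0, max_iter=20):
    """Least-squares fit that iteratively rejects points far from the fit."""
    keep = np.ones(norm_flux.size, dtype=bool)
    coeffs = np.zeros(basis.shape[1])
    for _ in range(max_iter):
        coeffs, *_ = np.linalg.lstsq(basis[keep], norm_flux[keep], rcond=None)
        resid = norm_flux - basis @ coeffs
        threshold = sigma * np.std(resid[keep])
        # Points once rejected stay rejected; stop when nothing new is clipped.
        still_good = keep & (np.abs(resid) <= threshold)
        if still_good.sum() == keep.sum():
            break
        keep = still_good
    return coeffs, keep


# Synthetic example: one basis vector, small noise, five large outliers.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 300)
basis = (t - t.mean())[:, None]
norm_flux = 0.05 * basis[:, 0] + 1e-3 * rng.standard_normal(t.size)
norm_flux[:5] += 0.5

plain, *_ = np.linalg.lstsq(basis, norm_flux, rcond=None)
robust, keep = clipped_lstsq(norm_flux, basis)
print("plain fit coefficient:  ", plain[0])
print("clipped fit coefficient:", robust[0])
```

In this synthetic case the outliers pull the ordinary fit away from the injected coefficient of 0.05, while the clipped fit rejects them and recovers it.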

Some sources of astrophysical variability, such as large 
amplitude, semi-regular variable stars, cannot be cor- 
rected satisfactorily with CBVs. Such sources which vary 
on a similar timescale to the length of a quarter are 
particularly problematic because the astrophysical sig- 
nal has the same frequency as the most dominant basis 
vector. In addition, if the astrophysics is too similar in 
structure to the trends created by differential velocity 
aberration, cotrending corrections with CBVs may not 
be adequate. In these cases, we do not recommend using 
the CBVs to mitigate for long-term artifacts. 

It should be noted that the CBV method for removing 
systematic trends relies on there being a large number of 
stars on a channel to well-describe the systematic effects 
present in the data. There are only 512 stars observed 
in short cadence mode at a given time, thus there are 
not enough stars on a single channel to fully capture
the systematics present on 1-min cadence. The method
currently available for mitigating short cadence artifacts 
is to interpolate the long cadence basis vectors over the 
short cadence timestamps. Artifacts on timescales less 
than 30-min cannot be removed from short cadence data 
using the CBV method. 
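The interpolation step can be sketched as follows. Linear interpolation with numpy.interp is an assumption, since the text does not specify the interpolation scheme. The helper name interpolate_cbvs and the toy timestamps are also hypothetical.

```python
import numpy as np


def interpolate_cbvs(lc_time, lc_cbvs, sc_time):
    """Resample long cadence basis vectors onto short cadence timestamps."""
    lc_cbvs = np.asarray(lc_cbvs, dtype=float)
    columns = [np.interp(sc_time, lc_time, lc_cbvs[:, k])
               for k in range(lc_cbvs.shape[1])]
    return np.column_stack(columns)


# Toy timestamps in minutes: 30-min long cadence, 1-min short cadence.
lc_time = np.arange(10) * 30.0
lc_cbvs = np.column_stack([2.0 * lc_time, np.sin(lc_time / 100.0)])
sc_time = np.arange(0.0, lc_time[-1] + 1.0, 1.0)

sc_cbvs = interpolate_cbvs(lc_time, lc_cbvs, sc_time)
print(sc_cbvs.shape)
```

The interpolated vectors contain no information on timescales shorter than the long cadence sampling, which is why artifacts faster than 30 minutes cannot be removed this way. Note also that numpy.interp holds the end values constant for timestamps outside the long cadence range.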

6. STITCHING KEPLER QUARTERS TOGETHER 

In Section 3, we described how systematic photometric 
artifacts result directly from the motion of targets within 
their pixel apertures due to differential velocity aberra- 
tion, spacecraft pointing, and focus variations. Similar 
in nature, Kepler data will often exhibit discontinuities 
across the data gaps that coincide with the quarterly rolls 
of the spacecraft. After each roll maneuver, most Ke- 
pler targets fall on a different CCD channel and the tar- 
get's point-spread function will be distributed differently 
across neighboring pixels. Naturally this redistribution 
requires a new computation of the target mask and op- 
timal aperture size, taking into account the point-spread 
function, new CCD characteristics, and new estimates 
of nearby source crowding. The operational outcome is 
often a different mask shape, with differing amounts of 
flux within the optimal aperture from both the target 
and contaminating sources. An illustrative example of 
this problem is provided by the quarter 4-6 calibrated 
pixels collected for the symbiotic star StHA 169 (KIC 
9603833), plotted in Figure 6.

6.1. Suggested stitching methods 

There are several methods available to attempt cor- 
recting for photometric discontinuities across the quarter 
gaps. None of the methods are well-suited to all occa- 
sions. Individually they can perform good corrections 
under specific conditions. We discuss the three methods 
below. 

6.1.1. Crowding and aperture flux loss adjustment 

The first method is to align different quarters using the 
time-invariant approximations for crowding and aperture 
flux losses stored within the light curve FITS file keywords.
These unitless keywords are stored in all Kepler
data processed after September 2011. For earlier data 
the same quantities are populated in the meta-tables 
of the data search and retrieval page at MAST. The 
FITS keyword FLFRCSAP contains the fraction of tar- 
get flux falling within the optimal aperture. The keyword 
CROWDSAP contains the ratio of target flux relative 
to flux from all sources within the photometric aper- 
ture. Both quantities are the average value over each 
quarter and are estimated using point-spread function
and spacecraft jitter models (Bryson et al. 2010) combined
with source characteristics found within the KIC
(Brown et al. 2011). The PDC time-series data archived
within the FITS light curve files have both of these cor- 
rections applied by default. Both corrections can be 
applied to SAP data manually using the keparith task 
within the PyKE package (see Appendices for a descrip- 
tion of PyKE). 
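Taking the keyword definitions above at face value, a purely multiplicative correction follows directly: multiplying the SAP flux by CROWDSAP gives the target's share of the aperture flux, and dividing by FLFRCSAP scales that share up to the total target flux. The sketch below assumes this multiplicative form. The function name and the keyword values are invented for illustration and are not taken from any archive file.

```python
import numpy as np


def scale_to_target_flux(sap_flux, crowdsap, flfrcsap):
    """Scale SAP flux by the quarterly crowding and flux-fraction keywords."""
    return np.asarray(sap_flux, dtype=float) * crowdsap / flfrcsap


# A constant star observed in two quarters with different (hypothetical)
# keyword values, given as (CROWDSAP, FLFRCSAP).
true_flux = 1000.0
keywords = {4: (0.95, 0.90), 5: (0.85, 0.97)}
sap = {q: np.full(5, true_flux * flfrc / crowd)
       for q, (crowd, flfrc) in keywords.items()}

aligned = {q: scale_to_target_flux(sap[q], *keywords[q]) for q in sap}
print({q: float(f[0]) for q, f in sap.items()})
print({q: float(f[0]) for q, f in aligned.items()})
```

In this idealized case the two quarters land on the same flux level after scaling. Real data will not align this cleanly, for the reasons discussed in the text.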

The limitations of this first method are two-fold. First, the
corrective factors supplied are model-dependent. The
characterization of the Kepler point-spread function does
not provide the same order of photometric precision as
aperture photometry. Furthermore, PSF modeling of the
pixels within an optimal aperture is only as complete as 
the KIC, which is complete only at Kp < 17. Secondly, 
the corrective factors are averaged over time, whereas 
in reality they vary from timestamp to timestamp as the 
field moves within the aperture. Furthermore, as the tar- 
get and neighboring stars vary independently, the crowd- 
ing correction in reality is an additive one rather than 
multiplicative. Therefore, while these two keywords col- 
lectively provide the simplest method of quarter stitch- 
ing, the solution is often inadequate. Given that target 
images for each timestamp are provided in the archive 
within the Target Pixel Files, there is some scope for 
Kepler users to improve upon these corrections by fit- 
ting a field and point-spread function model to the in- 
dividual images within these files. The Pixel Response 
Function (the combination of point-spread function and 
spacecraft jitter) information is available at the MAST 
archive within the focal plane characteristics download 
table^4. Fitting a PSF model can yield improvements
over the provided correction factors because the time- 
dependent variations associated with the problem are 
mitigated, and the user can take individual care char- 
acterizing all neighboring sources contributing to flux 
within the aperture. The primary limitation remaining 
after these improved steps will be the accuracy of the 
point-spread function model. 

6.1.2. Normalization of light curves 

A second approach that can yield adequate results is 
to individually normalize light curves on either side of a 
quarter gap by a functional fit or statistical measure of 
the data. Some simple corrections by statistical represen- 
tations of the data, such as mean, median and standard 
deviation, are available through the PyKE tools such as 
keparith. This method of correction is however a math- 
ematical convenience and users should remain aware of 
the non-physical biases that they may introduce into the 
data. 
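As an illustration of normalization by a statistical measure, the sketch below divides each quarter by its own median before concatenating. The function name stitch_by_median and the toy flux values are assumptions. Any astrophysical change in mean brightness between quarters is erased by this operation, which is one of the non-physical biases noted above.

```python
import numpy as np


def stitch_by_median(quarters):
    """Normalize each (time, flux) pair by its median flux and concatenate."""
    times, fluxes = [], []
    for time, flux in quarters:
        flux = np.asarray(flux, dtype=float)
        times.append(np.asarray(time, dtype=float))
        fluxes.append(flux / np.nanmedian(flux))
    return np.concatenate(times), np.concatenate(fluxes)


# Two toy quarters with different flux levels either side of a gap.
q4 = (np.arange(0.0, 5.0), np.array([1000.0, 1002.0, 998.0, 1001.0, 999.0]))
q5 = (np.arange(6.0, 11.0), np.array([1200.0, 1203.0, 1197.0, 1201.0, 1199.0]))

time, flux = stitch_by_median([q4, q5])
print(flux)
```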

6.1.3. Using more pixels in the target mask

As described in Section 3, a third method which often 
proves to be successful is increasing the number of pixels 
within the target mask using the PyKE tasks kepmask 
and kepextract. This approach will prove adequate if the 
optimal aperture can be increased to a large enough size 
as to make target flux losses out of the aperture negligible,
while avoiding significant contamination from nearby
sources. This third method will introduce additional shot 
noise into the resulting light curve by the inclusion of 
more source and background flux. The example of StHA 
169 over quarters 4-6 yields an adequate correction by 
this method, as demonstrated in Figure 7.

7. SUMMARY 

This paper has provided a general description of the
systematics contaminating archived Kepler data and an
introduction to the mitigation of those artifacts. Three
detailed examples of artifact mitigation are supplied
in Appendices A, B and C. Successful mitigation is
not guaranteed in all cases. Conceptual understanding,
methodology, and open source software development are
maturing such that the quality of mitigated data archive
products continues to increase with time. Several approaches
have been developed to manually improve the
fidelity of aperture photometry. In order to minimize the
impact of aperture photometry artifacts and obtain the
highest quality time-series data, we recommend that archive
users explore all of the methods discussed in this
paper. The most appropriate approach for individual targets
will be subjective and will be achieved through experimentation
and hands-on experience.

^4 http://archive.stsci.edu/kepler/fpc.html

To date, exploitation of Kepler data has naturally been
dominated by its primary mission goals of exoplanet
transit detection (Borucki et al. 2010) and asteroseismology
of solar-like stars (Chaplin et al. 2011). By the nature
of the mission design, both research areas continue
reaping a rich harvest from the Kepler archive. However,
there is a sensitivity threshold for both of these
disciplines that will require more state-of-the-art artifact
mitigation before reaching their full potential. Similarly,
Kepler has the potential through multi-year, highly
regular monitoring to provide a startling legacy archive
for fields such as active galactic nuclei, stellar activity,
gyrochronology, and, perhaps most compellingly, stellar
cycles. For much of the detailed astrophysics with Kepler
data, and for Kepler to reach its broader scientific potential,
the challenges of removing aperture photometry
systematics must be met.

APPENDIX
Appendices: Data analysis examples 

PyKE^5 is a suite of python software tools developed to reduce and analyze Kepler light curves, TPFs, and FFIs.
PyKE was developed as an add-on package to PyRAF - a python wrapper for IRAF^6 that provides for new tool
development to occur entirely in the python scripting language. PyKE can also be run as a stand-alone program 
within a unix-based shell without compiling against PyRAF. 

We present three general examples of how to prescribe and mitigate source contamination and systematic artifacts 
within the Kepler data using the PyKE software. These examples provide guidelines for the reader to follow, but data 
users are encouraged to experiment with the tunable input parameters for each tool. The procedures outlined in this 
section will not be optimal for all science, and it will ultimately be up to the user to determine what does optimize 
their scientific return. Ultimately these tools provide the user with flexibility to tune pixel extraction and artifact 
mitigation to the scientific potential of individual target data.

EXAMPLE 1: PIXEL IMAGES AND SOURCE CONTAMINATION 

Due to the typical angular size of Kepler photometric apertures and the relatively crowded fields within the Kepler
field of view, one cannot be certain whether astrophysical variability across a Kepler light curve comes entirely from 
the target star. The likelihood of source confusion around any given target is high. In order to resolve the sources of 
variability within a target mask, the archive user should examine the TPF file. For the purposes of this example, the 
archived Pre-search Data Conditioned light curve of KIC 2449074 is shown in Figure 8 as rendered by kepdraw. While
the PyKE tasks can be operated entirely through GUI-driven operation using the epar function on the command line, 
for the sake of clarity in these examples, we provide the command line task invocations within the PyRAF environment: 

kepdraw infile=kplr002449074-2009350155506_llc.fits
outfile=kplr002449074-2009350155506_llc.png datacol=PDCSAP_FLUX ploterr=n
errcol=PDCSAP_FLUX_ERR quality=y

The above command is asking that the PDCSAP_FLUX column in the archived FITS file kplr002449074-2009350155506_llc.fits
be plotted to a new file called kplr002449074-2009350155506_llc.png. Plotting of the 1-σ error
bars from the PDCSAP_FLUX_ERR column will be suppressed, and timestamps with non-zero quality flags will be 
ignored. PyKE will request more parameters through the command line before creating the plot but users can take 
the default options. Experimentation will reveal that these additional parameters control the look and feel of the plot 
- e.g. colors, line widths, fonts, etc. 

This object shows regular, low amplitude dips in brightness every 4.9 days that, at face value, are suggestive of a 
planetary transit of the target star. Figure 9 shows a calibrated flux time series of each target mask pixel collected
over Q3. This figure was produced with the PyKE task keppixseries: 

keppixseries infile=kplr002449074-2009350155506_lpd-targ.fits
outfile=keppixseries.fits plotfile=keppixseries.png plottype=local filter=n



^5 http://keplergo.arc.nasa.gov/PyKE.shtml

^6 http://iraf.noao.edu

Figure 9 reveals unambiguously that the target star is not the source of the "transit" features. A background
eclipsing binary star is situated 10 arcsec from the target star (2.5 pixels to the left of KIC 2449074 on the figure)
and is leaking into the optimal aperture. By extracting the light curve manually using different pixels, one can either
reduce the contaminating flux from the eclipsing binary or, alternatively, extract flux from the eclipsing binary. New
mask files are created interactively using the kepmask tool:

kepmask infile=kplr002449074-2009350155506_lpd-targ.fits maskfile=mask1.txt
plotfile=kepmask.png tabrow=2177 iscale=linear cmap=bone

kepmask infile=kplr002449074-2009350155506_lpd-targ.fits maskfile=mask2.txt
plotfile=kepmask.png tabrow=2177 iscale=linear cmap=bone

The image associated with the 2,177th timestamp in the Target Pixel File is plotted over a linear intensity scale
using the bone color lookup table. Users of the kepmask tool define a new aperture interactively by moving the mouse 
over a pixel. One press of the 'X' keyboard key selects a pixel for inclusion within the new aperture, a second press 
deselects the pixel. The aperture is stored by clicking the 'DUMP' button on the interactive GUI. The task kepextract
is then employed to sum the pixels without weights within the newly defined aperture.

In each case a new mask was defined interactively using the method described in Appendix B. Figure 10 plots the
mask stored in the file mask1.txt, while Figure 11 plots the mask stored in file mask2.txt. The commands used to
extract new SAP light curves from the TPF are:

kepextract infile=kplr002449074-2009350155506_lpd-targ.fits maskfile=mask1.txt
outfile=kepextract1.fits

kepextract infile=kplr002449074-2009350155506_lpd-targ.fits maskfile=mask2.txt
outfile=kepextract2.fits

The resulting SAP light curves of a less-contaminated target star and the background eclipsing binary star can be 
found in Figures 12 and 13. Both were constructed using the kepdraw task:

kepdraw infile=kepextract1.fits outfile=kepextract1.png datacol=SAP_FLUX ploterr=n
errcol=SAP_FLUX_ERR quality=y

kepdraw infile=kepextract2.fits outfile=kepextract2.png datacol=SAP_FLUX ploterr=n
errcol=SAP_FLUX_ERR quality=y
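The unweighted pixel sum that defines simple aperture photometry can be illustrated with a short sketch. This is not the kepextract source: the function name extract_sap, the array layout (cadence, row, column) and the toy pixel values are assumptions made for the example.

```python
import numpy as np


def extract_sap(pixel_flux, mask):
    """Sum, without weights, the pixels selected by a boolean aperture mask.

    pixel_flux : (n_cadences, n_rows, n_cols) calibrated pixel fluxes
    mask       : (n_rows, n_cols) boolean array, True inside the aperture
    """
    pixel_flux = np.asarray(pixel_flux, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    return pixel_flux[:, mask].sum(axis=1)


# Toy target pixel data: three cadences of a 2x2 pixel mask.
pixel_flux = np.arange(12, dtype=float).reshape(3, 2, 2)
mask = np.array([[True, False],
                 [False, True]])

light_curve = extract_sap(pixel_flux, mask)
print(light_curve)
```

Changing which entries of the mask are True plays the same role as selecting different pixels interactively: one choice can favor the target, another the contaminating source.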

EXAMPLE 2: MITIGATING SPACECRAFT SYSTEMATICS BY RE-EXTRACTING TARGET PIXELS 

Preceding sections of this paper have provided qualitative motivation for replacing archived light curves with pho- 
tometry re-extracted from the Target Pixel Files. From the TPFs, a customized light curve can be extracted from a 
new aperture containing any or all of the pixels in the target mask using a combination of the PyKE tasks kepmask (to 
define the new optimal aperture) and kepextract (to construct simple aperture photometry across the newly-defined 
aperture). In this example we consider the quarter 2 data for KIC 8703536, applying a custom aperture to extract a 
light curve from the pixel images. This target provides a conspicuous example because the source is spatially extended. 
The archived mask and aperture were constructed unaware of this fact and we can predict that the archived light curve 
flux is undercaptured. Furthermore, the KIC indicates that the target lies close to several fainter stellar sources that 
might be contaminants to the archived light curve. 

The archived SAP light curve for KIC 8703536 is displayed in the top panel of Figure 14. This plot can be recreated
using the PyKE task kepdraw. 

kepdraw infile=kplr008703536-2009259160929_llc.fits outfile=sap.png datacol=SAP_FLUX
ploterr=n errcol=SAP_FLUX_ERR quality=y

The target is the Seyfert 2 galaxy 2MASX J19471938+4449425 and it is expected to be relatively quiet in the 
Kepler bandpass at high frequencies. Nevertheless, the SAP light curve displays four features that we can identify as 
systematic in nature because they coincide with spacecraft events and occur within all Kepler target light curves at 
the same time to a lesser or greater degree. These occur after a spacecraft safe mode, a pointing to Earth for data 
transfer, and two spacecraft attitude tweaks. Each systematic event can be identified by referencing the data quality 
flags supplied within the light curve file and TPF. 
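A sketch of how the quality flags might be decoded is given below. The quality column is a bit mask, so each event type is tested with a bitwise AND. The bit values listed here are assumptions based on the commonly documented Kepler flag definitions and should be checked against the archive manual before use. The function name and the toy flag array are hypothetical.

```python
import numpy as np

# Assumed bit values for four spacecraft events (verify against the
# archive documentation before relying on them).
QUALITY_BITS = {
    "attitude tweak": 1,
    "safe mode": 2,
    "coarse point": 4,
    "earth point": 8,
}


def flagged_cadences(quality):
    """Return the cadence indices at which each event flag is set."""
    quality = np.asarray(quality, dtype=np.int64)
    return {name: np.flatnonzero(quality & bit)
            for name, bit in QUALITY_BITS.items()}


# Toy quality column: a value of 3 has both the first and second bits set.
quality = np.array([0, 1, 2, 8, 0, 3])
events = flagged_cadences(quality)
for name, indices in events.items():
    print(name, indices)
```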

The photometric aperture that yielded the light curve in Figure 14 and the individual pixel photometry are provided
in Figure 15. This image can be replicated using the PyKE task keppixseries:

keppixseries infile=kplr008703536-2009259160929_lpd-targ.fits
outfile=keppixseries.fits plotfile=keppixseries.png plottype=global filter=n

The file kplr008703536-2009259160929_lpd-targ.fits is the archived Target Pixel File coupled to the archived light
curve. Two new files will be created - keppixseries.fits contains the tabulated data for each individual pixel light
curve, and keppixseries.png contains the requested plot. The parameter plottype=global requests that all light curves
are plotted on the same photometric scale and the data plotted are not bandpass filtered to reduce the low frequency
effects of differential velocity aberration.

Our task is to reduce the systematic artifacts by extracting a new light curve from a more strategic choice of pixel 
aperture. The righthand panel of Figure 15 contains the pixel image from one specific timestamp in the Quarter 2
Target Pixel File of KIC 8703536. The green, transparent pixels represent a new custom photometric aperture defined
using the PyKE task kepmask. An interactive image is called with the command:

kepmask infile=kplr008703536-2009259160929_lpd-targ.fits maskfile=mask.txt
plotfile=kepmask.png tabrow=2177 iscale=linear cmap=bone

As described in Appendix A, in this example, the new aperture is stored in a file called mask.txt, and the task
kepextract is called for this target pixel file. 

kepextract infile=kplr008703536-2009259160929_lpd-targ.fits maskfile=mask.txt
outfile=kepextract.fits

The light curve in the lower panel of Figure 14 is the target data re-extracted from the new aperture and plotted, as
before, with kepdraw:

kepdraw infile=kepextract.fits outfile=kepextract.png datacol=SAP_FLUX ploterr=n
errcol=SAP_FLUX_ERR quality=y

While there remains some high frequency structure associated with the safe mode and potentially some residual 
low frequency noise related to differential velocity aberration, systematic artifacts are much reduced with the new 
aperture. Either the extended target is still not fully captured or we have introduced new systematic noise with the 
inclusion of new source contaminants. Optimizing the light curve further with additional aperture iterations is left as 
an exercise for the reader. 

EXAMPLE 3: SYSTEMATIC ARTIFACT REMOVAL USING THE COTRENDING BASIS VECTORS 

In this example we will reduce the systematic trends present in the quarter 3 SAP time-series of an eccentric binary 
star. As we proceed through the steps, note that each improvement requires a subjective decision based upon both 
foreknowledge of events recorded in the Kepler data quality flags and physical insight of the target in question. The 
SAP photometry of this target is plotted against barycenter-corrected time in Figure 16. This plot was made using
the PyKE tool kepdraw: 

kepdraw infile=kplr003749404-2009350155506_llc.fits
outfile=kplr003749404-2009350155506_llc.png datacol=SAP_FLUX ploterr=n
errcol=SAP_FLUX_ERR quality=n

We fit the first two CBVs to the data using the kepcotrend task invocation: 

kepcotrend infile=kplr003749404-2009350155506_llc.fits
outfile=kplr003749404-2009350155506_cbv.fits
cbvfile=kplr2009350155506-q03-d04_lcbv.fits vectors='1 2' method=llsq iterate=n

The llsq method requires kepcotrend to perform a linear least-squares fit and subtraction of the basis vectors upon
the SAP data. No sigma clipping iterations are performed during the fit. The quarter 3 CBV file,
kplr2009350155506-q03-d04_lcbv.fits, can be downloaded from the Kepler archive at MAST. The full content of the input light curve file
is copied to the output file and a new column called CBVSAP_FLUX is appended to the FITS table containing the
best-fit, CBV-subtracted light curve. The result is shown in Figure 17 and yields an improvement over the photometric
quality of the SAP light curve. The long-term trend has been greatly reduced, but there are still higher-frequency
features that are most likely systematic, and the fit can be improved further. In particular, we would like to remove
the broad dip that has been introduced between the first and second brightening events in the time-series. We will
strive to obtain a correction where the heights of each event are identical. Performing another fit using five basis
vectors with the following command yields the result shown in Figure 18:

kepcotrend infile=kplr003749404-2009350155506_llc.fits
outfile=kplr003749404-2009350155506_cbv.fits
cbvfile=kplr2009350155506-q03-d04_lcbv.fits vectors='1 2 3 4 5' method=llsq iterate=n

This new result is a qualitative improvement compared to the two-CBV fit, but the solution is still not optimal. We
would like to improve the fit to the thermal event at BJD 2,455,156.5 and also further flatten the structure occurring
after BJD 2,455,145. We fit the SAP data again, this time using eight basis vectors. The plot is shown in Figure 19,
but the result appears to be less optimal than the 5 basis vector fit. Anomalous structure has very likely been added
to the time series by the CBVs. One possible reason for the less than optimal solution is that eight basis vectors are
over-fitting the periodic brightenings and adding new systematic noise to the intervals between them. To test this
hypothesis we masked out light curve segments containing the large amplitude brightenings and fit the CBVs to the
remaining data. We employed the PyKE task keprange to define the masked regions in the time series:

keprange infile=kplr003749404-2009350155506_llc.fits outfile=keprange.txt
column=SAP_FLUX

This will plot the SAP_FLUX column data within the light curve file over time. Ranges in time can be defined
by selecting start and stop times with the mouse and 'X' keyboard key. We masked four ranges in this example, as
illustrated in Figure 20, and these ranges will be saved to a text file after clicking the 'SAVE' button on the interactive
GUI. We performed the eight basis vector fit one last time, excluding from the fit the regions defined in Figure 20,
again using the kepcotrend task:

kepcotrend infile=kplr003749404-2009350155506_llc.fits
outfile=kplr003749404-2009350155506_cbv.fits
cbvfile=kplr2009350155506-q03-d04_lcbv.fits vectors='1 2 3 4 5 6 7 8' method=llsq
iterate=n maskfile=keprange.txt
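The idea behind the masked fit can be sketched as follows: the coefficients are determined only from cadences outside the masked ranges, and the scaled basis vectors are then subtracted from every cadence. This is an illustrative sketch and not the kepcotrend implementation. The function name cotrend_with_mask, the single synthetic basis vector and the Gaussian brightenings are all assumptions.

```python
import numpy as np


def cotrend_with_mask(time, flux, cbvs, masked_ranges):
    """Fit basis vectors outside the masked time ranges, subtract everywhere."""
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    use = np.ones(time.size, dtype=bool)
    for start, stop in masked_ranges:
        use &= ~((time >= start) & (time <= stop))
    median = np.median(flux[use])
    norm = flux / median - 1.0
    # Only the unmasked cadences constrain the coefficients.
    coeffs, *_ = np.linalg.lstsq(cbvs[use], norm[use], rcond=None)
    corrected = (norm - cbvs @ coeffs + 1.0) * median
    return coeffs, corrected


# Synthetic light curve: a linear systematic trend plus two brightenings.
t = np.linspace(0.0, 1.0, 1000)
cbvs = (t - t.mean())[:, None]
pulses = 0.1 * (np.exp(-0.5 * ((t - 0.2) / 0.01) ** 2)
                + np.exp(-0.5 * ((t - 0.7) / 0.01) ** 2))
flux = 1000.0 * (1.0 + 0.02 * cbvs[:, 0] + pulses)

unmasked, _ = cotrend_with_mask(t, flux, cbvs, [])
masked, corrected = cotrend_with_mask(t, flux, cbvs,
                                      [(0.15, 0.25), (0.65, 0.75)])
print("coefficient without mask:", unmasked[0])
print("coefficient with mask:   ", masked[0])
```

In this synthetic case the brightenings bias the unmasked coefficient away from the injected value of 0.02, while the masked fit recovers it and leaves the brightenings intact in the corrected light curve.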

In Figure 21 we see the final version of the light curve. Systematic effects still remain, e.g. the thermal settling
event is not totally removed but, subjectively, the data are much improved over the time series. The quality of the
corrected light curve here was considered adequate for the scientific analysis presented by Thompson et al. (2012).

Figure 1. Quarter 2 long cadence SAP light curve of the eclipsing binary star V1950 Cyg (KIC 12164751; Horne 2008) produced by
the PyKE tool kepdraw. The most prominent systematic effect in this light curve is the long-term decay in flux from the target which falls
by 3% over the duration of the quarter. This drop is a consequence of differential velocity aberration pushing the under-captured target
position across the fixed pixel aperture over time.

Figure 2. Example output plots from PyKE tools keppixseries and kepmask. (left) The plot is created from the target pixel files. The
numbers along the axes identify the pixel column and row on the CCD. The pixels comprising the optimal aperture for the target have a
gray background. No data are collected in the black pixels, and the white pixels, which typically collect some of the target's
flux, are the halo pixels (see Section 3 for definitions). (right) The axes labels "arbitrary flux" and "time" refer to the photometric time
series plotted for each individual pixel. Each light curve in the gray and white pixels is plotted on an identical linear flux scale. The target
flux is continuously being redistributed among the neighboring pixels of the optimal aperture as the quarter progresses. The plot on the
right also shows the decline of target flux, which indicates the target moved within the optimal aperture pixels. The missing flux was
redistributed to the pixel above, as seen in the left panel. The pixel light curves are for Quarter 2 long cadence observations of V1950 Cyg.

Figure 3. An example of eight cotrending basis vectors with the highest principal values, or contribution to systematic variability, from
channel 50 over operational quarter 5. Each basis vector is on the same relative flux scale, centered about 0.0 e⁻ s⁻¹. Basis vectors can
be linearly-fit to a light curve and subtracted off to mitigate for systematic effects. The fit coefficients can either be positive or negative.
They run from left-to-right, top-to-bottom, in order of significance.

Figure 4. Sixteen long cadence light curves chosen at random from quarter 5, channel 50. All light curves show some degree of correlation.
The most obvious common feature is an increase in the flux level over the course of the quarter. This is the manifestation of differential
velocity aberration.



[Figure 5 image: sixteen light-curve panels; horizontal axis: BJD - 2455200]



Figure 5. The sixteen quarter 5 SAP ligiit curves presented in Figure |4] after the best-fit CBV ensemble has been subtracted. 
Figure 6. Light curves extracted from all single pixels within the quarter 4, 5 and 6 Target Pixel Files for KIC 9603833 (the symbiotic 
star StHA 169). Gray pixels comprise the optimal apertures that yield the archived light curves. The target point-spread function is 
distributed across neighboring pixels differently from quarter to quarter and hence the optimal aperture varies in size and shape from 
quarter to quarter. The amount of target flux falling outside of the optimal aperture is quarter-dependent. Data are not collected from 
the black pixels. The plots were created with the keppixseries PyKE task, using the plottype=global option. 
Figure 7. Upper: The archived SAP quarter 4, 5 and 6 light curves for StHA 169 (KIC 9603833). Each light curve was extracted from 
the optimal apertures (grey pixels) defined in Figure 6. Photometric discontinuities occur at each quarterly roll due to the redistribution 
of target flux over a new CCD (indicated by the blue bars for clarity), and the redefinition of the optimal aperture. Lower: Quarterly roll 
discontinuities are reduced by re-extracting the three light curves over all available pixels in the target masks. In this specific example, 
the redefined optimal aperture collects more of the target's flux without introducing significant contaminating flux from nearby sources. 
Extraction was performed using the PyKE kepextract task with the maskfile=all parameter. Some further examples of pixel extraction are 
provided in Appendices A and B. 
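
Re-extracting a light curve over a different set of pixels amounts to summing the pixel time series inside a chosen aperture. The sketch below illustrates the idea with NumPy on a small invented pixel cube; it is not the kepextract code. The function name extract_light_curve, the array layout (cadence, row, column), and the use of NaN for uncollected pixels and missing cadences are assumptions made for the example.

```python
import numpy as np

def extract_light_curve(pixel_flux, aperture):
    """Sum pixel time series over the pixels selected by a boolean aperture.

    pixel_flux : array of shape (n_cadences, n_rows, n_cols)
    aperture   : boolean array of shape (n_rows, n_cols)
    Cadences with no finite pixel inside the aperture are returned as NaN.
    """
    pixel_flux = np.asarray(pixel_flux, dtype=float)
    aperture = np.asarray(aperture, dtype=bool)
    selected = pixel_flux[:, aperture]  # shape (n_cadences, n_selected)
    light_curve = np.nansum(selected, axis=1)
    empty = ~np.isfinite(selected).any(axis=1)
    light_curve[empty] = np.nan
    return light_curve

# Invented 3x3 mask observed over four cadences.
cube = np.ones((4, 3, 3))
cube[:, 1, 1] = 10.0    # bright central pixel
cube[:, 0, 0] = np.nan  # pixel from which no data are collected
cube[2] = np.nan        # a cadence with no data

optimal = np.zeros((3, 3), dtype=bool)
optimal[1, 1] = True                          # stand-in optimal aperture
all_pixels = np.isfinite(cube).any(axis=0)    # every collected pixel

lc_optimal = extract_light_curve(cube, optimal)
lc_all = extract_light_curve(cube, all_pixels)
print(lc_optimal)
print(lc_all)
```

A manually defined aperture, such as one drawn around a target or a background source, would be passed in the same way as another boolean array.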
Figure 8. The archived quarter 3 PDCSAP light curve of KIC 2449074. The regular dips in brightness every 4.9 d resemble a planetary 
transit. 
Figure 9. Calibrated time-series photometry for every pixel within the mask surrounding the quarter 3 target KIC 2449074. Unlike 
Figure 2, each light curve is plotted on an independent flux scale in order to reveal any background sources that are faint relative to the 
brighter target. Source confusion with a background binary within the optimal aperture (gray pixels) is evident in pixel x=650, y=319. 
The regular dips found in the PDCSAP light curve of Figure 8 do not originate from the target that the mask was designed for. Their source 
is a faint background binary star on the left-hand side of the mask. 
Figure 10. This pixel map image, created with the PyKE tool kepmask, is a typical flux image within the quarter 3 pixel mask of KIC 
2449074. The green region is a manually-defined photometric aperture that maximizes the signal from the target star. The new selection 
of pixels, as indicated in green, minimizes the contamination from a background eclipsing binary within the target mask. When summed, 
the pixels within the new optimal aperture produce the light curve in Figure 12. 
Figure 11. As for Figure 10, except the green region is a manually-defined optimal aperture that maximizes the signal from the background 
binary star within the mask. The selected green pixels minimize the contamination from the target star. When summed, the pixels within 
the new optimal aperture produce the light curve in Figure 13. 
Figure 13. The quarter 3 SAP light curve for the background binary within the target mask of KIC 2449074, constructed by summing 
the pixels within the optimal aperture defined in Figure 11. 
Figure 14. a) The archived quarter 2 SAP light curve for the Kepler target KIC 8703536. Blue regions identify specific events adding 
conspicuous systematic noise to the light curve, as labelled within the white boxes. b) A version of the light curve, mitigated for systematics 
by a different choice of optimal aperture pixels. A recipe for creating this new light curve from the archived Target Pixel File is provided 
in Appendix B. 
Figure 15. a) Individual light curves from the pixels within the mask collected for the quarter 2 target KIC 8703536. The optimal 
aperture used to construct the archived light curve of this target (Figure 14a) is defined by the grey pixels. This figure was produced with 
PyKE task keppixseries, similar to Figure 2. b) A manually-constructed aperture is defined by the green pixels, overlaid upon one specific 
postage stamp image contained within the archived Target Pixel File. The new aperture yields the light curve provided in Figure 14b. 
This figure was produced with PyKE task kepmask, which allows the user to select pixels to construct a new optimal aperture. The selected 
pixels appear green over the postage stamp image of the target. 
Figure 16. The quarter 3 long cadence SAP light curve of KIC 3749404. The long-term trend of increasing flux, 1% in amplitude, is 
most likely caused by differential velocity aberration. A 5-d interval of thermal settling after an Earth point at BJD 2,455,156.5 stands 
out as a likely systematic feature over the astrophysical signal. Cotrending Basis Vectors can be employed to remove or reduce these 
undesirable systematic effects. 
Figure 17. A two-CBV fit to the archived quarter 3 SAP light curve of KIC 3749404. The upper panel of the plot shows the original SAP 
light curve in blue and the best linear least-squares fit of the two basis vectors in red. The lower panel contains the result of subtracting 
the basis vector fit from the original light curve. While systematic effects have been reduced, some remain. For example, the poor 
basis vector fit to the data between the first and second maxima around BJD 2,455,115 shows that the systematics were not completely 
mitigated. 
Figure 18. As for Figure 17. This time, we fit five CBVs to the quarter 3 SAP light curve of KIC 3749404. The systematics now appear 
to be much reduced but there are still some effects in the second half of the quarter that can be mitigated further (e.g. the poor fit to the 
thermal settling event around BJD 2,455,156.5). 
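
The progression from two to five to eight basis vectors in Figures 17, 18 and 21 can be imitated by repeating the fit with an increasing number of vectors and comparing the residual scatter. The sketch below is an assumption-laden illustration, not the kepcotrend implementation: the function name cotrend, the constant offset term, and the random-walk stand-ins for archived CBVs are invented for the example.

```python
import numpy as np

def cotrend(flux, cbvs, n_vectors):
    """Fit a constant plus the first n_vectors basis vectors to the flux.

    Returns the flux with the fitted basis-vector model subtracted.
    """
    flux = np.asarray(flux, dtype=float)
    basis = np.asarray(cbvs, dtype=float)[:, :n_vectors]
    design = np.column_stack([np.ones(flux.size), basis])
    params, *_ = np.linalg.lstsq(design, flux, rcond=None)
    return flux - basis @ params[1:]

rng = np.random.default_rng(1)
n_cadences = 400
# Hypothetical stand-ins for archived CBVs: eight zero-mean random walks.
cbvs = np.cumsum(rng.normal(size=(n_cadences, 8)), axis=0)
cbvs -= cbvs.mean(axis=0)
weights = rng.normal(scale=3.0, size=8)
flux = 5000.0 + cbvs @ weights + rng.normal(scale=2.0, size=n_cadences)

scatter = {n: float(np.std(cotrend(flux, cbvs, n))) for n in (2, 5, 8)}
for n, value in scatter.items():
    print(f"{n} basis vectors: residual scatter = {value:.2f}")
```

Because each fit contains the previous one as a special case, the residual scatter cannot increase as vectors are added. A smaller scatter is therefore not by itself evidence of a better correction, since additional vectors can also absorb astrophysical signal. Visual inspection of the fit, as in the figures, remains necessary.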
Figure 20. The flux level in the green highlighted regions is not typical of the rest of the light curve and posed problems for the eight 
basis vector least-squares fit. In order to improve the CBV fit to the SAP light curve we ignored the regions highlighted in green during 
fit minimization. This figure shows the interactive environment of the keprange tool, developed to define discrete regions of time-series 
data. 
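
Ignoring selected time ranges during fit minimization can be sketched as a least-squares fit restricted to the retained cadences, with the resulting model then subtracted from every cadence. The example below is a minimal illustration under stated assumptions, not the keprange or kepcotrend code: the function name fit_cbvs_excluding, the constant offset term, and the synthetic light curve with an injected brightening are all invented.

```python
import numpy as np

def fit_cbvs_excluding(time, flux, cbvs, exclude_ranges):
    """Fit basis vectors using only cadences outside the excluded ranges.

    exclude_ranges : iterable of (start, stop) times, inclusive.
    Returns the coefficients, the corrected flux for all cadences,
    and the boolean array of cadences used in the fit.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    cbvs = np.asarray(cbvs, dtype=float)
    keep = np.ones(time.shape, dtype=bool)
    for start, stop in exclude_ranges:
        keep &= ~((time >= start) & (time <= stop))
    design = np.column_stack([np.ones(int(keep.sum())), cbvs[keep]])
    params, *_ = np.linalg.lstsq(design, flux[keep], rcond=None)
    coeffs = params[1:]
    corrected = flux - cbvs @ coeffs  # model applied to every cadence
    return coeffs, corrected, keep

# Synthetic light curve: a linear systematic trend plus an atypical event.
time = np.linspace(0.0, 90.0, 901)
cbvs = (time / 90.0 - 0.5)[:, np.newaxis]  # one invented basis vector
flux = 100.0 + 20.0 * cbvs[:, 0]
event = (time >= 60.0) & (time <= 70.0)
flux[event] += 50.0                        # region to be ignored in the fit

coeffs_all, _, _ = fit_cbvs_excluding(time, flux, cbvs, [])
coeffs_masked, corrected, keep = fit_cbvs_excluding(time, flux, cbvs, [(60.0, 70.0)])
print("coefficient using all cadences:", coeffs_all[0])
print("coefficient with the event excluded:", coeffs_masked[0])
```

In this invented case the fit over all cadences is pulled away from the injected coefficient of 20 by the event, while the fit that excludes the event recovers it. The event itself is preserved in the corrected light curve.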
Figure 21. The final iteration of the CBV fit to the quarter 3 SAP light curve of KIC 3749404. We used the PyKE tool kepcotrend and 
fit eight basis vectors to the light curve. We did not fit the regions of the light curve highlighted in Figure 20. 



